Putting a Value on Comparable Data
Abstract
Machine translation began in 1947 with an influential memo by Warren Weaver. In that memo, Weaver noted that human code-breakers could transform ciphers into natural language (e.g., into Turkish) without access to parallel ciphertext/plaintext data, and without knowing the plaintext language’s syntax and semantics. Simple word and letter statistics seemed to be enough for the task. Weaver then predicted that such statistical methods could also solve a tougher problem, namely language translation. This raises the question: can sufficient translation knowledge be derived from comparable (non-parallel) data? In this talk, I will discuss initial work in treating foreign language as a code for English, where we assume the code to involve both word substitutions and word transpositions. In doing so, I will quantitatively estimate the value of non-parallel data, versus parallel data, in terms of end-to-end accuracy of trained translation systems. Because we still know very little about solving word-based codes, I will also describe successful techniques and lessons from the realm of letter-based ciphers, where the non-parallel resources are (1) enciphered text and (2) unrelated plaintext. As an example, I will describe how we decoded the Copiale cipher with limited “computer-like” knowledge of the plaintext language. The talk will wrap up with challenges in exploiting comparable data at all levels: letters, words, phrases, syntax, and semantics.